
feat(memory): Hermes V3 long-term memory — W1 + W2 (sqlite_vec + import + read + memdebug)#2

Merged
liyoungc merged 11 commits into main from
feat/memory-sqlite-vec-w1
May 2, 2026
Conversation


liyoungc (Owner) commented May 2, 2026

Implements W1 + W2 + W3 + W4 (prepped) of the Hermes V3 long-term memory design.

W1 — schema bootstrap (done)

plugins/memory/sqlite_vec/ registers as a MemoryProvider plugin: episodes (hot tier) + semantic_facts (cold tier) + vec_facts (vec0 virtual table) + 3 sync triggers.

W2-1 — read path + embedding wrapper (done)

  • embed.py: async voyage_embed() (httpx, 128 batch, 3× backoff retry, locked dim/dtype 512/int8).
  • read.py: Fact dataclass + async read_memory() (vec0 prefilter k=50, SQL CTE rerank 0.7*sim + 0.3*exp(-age/90)), p95 logged. bump_hits() fire-and-forget. format_facts_for_prompt() with with_meta flag.
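The rerank formula above reduces to a few lines. This is a minimal sketch with the weights and recency decay taken from the description (0.7*sim + 0.3*exp(-age/90)); the function and variable names are illustrative, not the actual read.py API:

```python
import math

# Illustrative rerank: blend vector similarity with a 90-day
# exponential recency decay, per the W2-1 description.
def rerank_score(sim: float, age_days: float) -> float:
    return 0.7 * sim + 0.3 * math.exp(-age_days / 90)

# A fresher fact can outrank a slightly more similar but stale one.
facts = [("birthday", 0.60, 5), ("old-note", 0.62, 400)]
ranked = sorted(facts, key=lambda f: rerank_score(f[1], f[2]), reverse=True)
```

In the plugin this blend runs inside a SQL CTE over the vec0 prefilter's top-50, so only candidate rows pay the scoring cost.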

W2-2 — MEMORY.md import (done)

  • scripts/import_md.py: parses Topic: content §, slugifies hierarchy, preserves CJK, idempotent, atomic, --dry-run / --commit.
  • 25 entries imported live on chococlaw; cosine retrieval verified.

W2-3 — wire prefetch + sync_turn (done, live on chococlaw)

  • Provider prefetch() runs in worker thread (5s timeout) returning the recall block.
  • check_same_thread=False + per-provider lock for cross-thread sqlite3.
  • Activation: config.yaml memory.provider: sqlite_vec (no env-var gate).

W2-4 — /memdebug slash command (done, live)

  • plugins/memdebug/ standalone plugin, registers /memdebug <q> and /memdebug rawsearch <q>. Logs invocations to memory.log.

W3-1 — kimi_extract + EXTRACT_PROMPT (done)

  • plugins/memory/sqlite_vec/extract.py: PROMPT verbatim from spec §5.2, PHI_BLACKLIST_CHANNELS short-circuit, tolerant JSON parser (handles 3 different Kimi output shapes observed in live testing).

W3-2 — write_episode + sync_turn write-back (done, live)

  • plugins/memory/sqlite_vec/write.py: per-turn write back, fast-track threshold 30d, JSONL failure log.
  • Wired into sync_turn after bump_hits via worker thread (30s timeout). msg_id synthesized via hash for idempotency.

W3-3 — weekly_promotion + weekly_apply (done, live, cron-scheduled)

  • plugins/memory/sqlite_vec/promotion.py: PROMOTION_PROMPT designed; weekly_promotion + weekly_apply async; render_digest_markdown matching spec §5.4; discord_post helper with chunking.
  • scripts/cron/weekly_promotion.py + weekly_apply.py thin wrappers (deployed to ~/.hermes/scripts/).
  • Cron entries added: 0 19 * * 6 (Sun 03:00 UTC+8) + 0 19 * * 0 (Mon 03:00 UTC+8).
  • Auto-fallback observed: Kimi-K2-Thinking 404s on synthetic.new → falls back to K2.5.

W3-4 — /memreview reject + /mem kill switch (done, live)

  • plugins/memreview/: /memreview reject <digest_id> writes sentinel; /mem off|on|status toggles MEM_OFF global kill switch.
  • MEM_OFF check wired into both sync_turn (skip write_episode) and weekly_promotion (skip Kimi call). Read path unaffected.

W4-1 — cutover prep (prepped, awaiting soak)

  • scripts/cutover/cutover.sh: idempotent bash script, dry-run by default. Archives MEMORY.md, disables legacy crons, smoke tests, restarts gateway.
  • Not executed — acceptance criteria require 1 full day soak + 1 weekly review cycle observed. User runs when ready (target 2026-05-24).

W4-2 — runbooks (done, in hermes-memory repo)

W1 schema fixes bundled across W2-W3

  1. vec_facts was FLOAT[512] → changed to int8[512] (W2-1). vec0 INSERT requires vec_int8(blob) wrapper; UPDATE rejected on int8 even with wrapper, so trigger rewritten as DELETE+INSERT.
  2. vec0 default L2 distance breaks the rerank formula on int8; added distance_metric=cosine (W2-2).
  3. LOG_PATH = Path.home() resolves to /home/hermes inside container (not the /opt/data mount); switched to hermes_constants.get_hermes_home() (W2-4).
  4. SQLite connection thread-safety: check_same_thread=False + per-provider lock (W2-3).
  5. _apply_diff_atomic was holding BEGIN open across Voyage HTTP — embed BEFORE BEGIN (W3-3).
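The DELETE+INSERT trigger rewrite in fix (1) can be illustrated with plain SQLite. vec0 itself is not loadable here, so `facts`/`facts_mirror` below are hypothetical stand-ins for semantic_facts/vec_facts; the pattern is the point:

```python
import sqlite3

# Stand-in schema: vec0 rejects UPDATE on int8 embedding columns,
# so the sync trigger deletes and re-inserts the mirror row instead.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE facts (id INTEGER PRIMARY KEY, embedding BLOB);
CREATE TABLE facts_mirror (id INTEGER PRIMARY KEY, embedding BLOB);
CREATE TRIGGER facts_after_update_embedding
AFTER UPDATE OF embedding ON facts BEGIN
  DELETE FROM facts_mirror WHERE id = old.id;
  INSERT INTO facts_mirror (id, embedding) VALUES (new.id, new.embedding);
END;
""")
db.execute("INSERT INTO facts VALUES (1, x'01')")
db.execute("INSERT INTO facts_mirror VALUES (1, x'01')")
db.execute("UPDATE facts SET embedding = x'02' WHERE id = 1")
row = db.execute("SELECT embedding FROM facts_mirror WHERE id = 1").fetchone()
```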

End-to-end live verification (chococlaw)

| Path | Result |
| --- | --- |
| W2-1 retrieval × 3 queries | top-1 semantically correct, sim ∈ [0.43, 0.61] |
| W2-2 import 25 facts | one Voyage batch, idempotent re-run |
| W2-3 in-process smoke | MemoryManager.prefetch_all returns full markdown block |
| W2-4 /memdebug × 3 | help / semantic / rawsearch all working |
| W3-1 kimi_extract × 4 | pleasantry/long-lived/PHI ✓; short-lived ⚠ (spec prompt issue, see issue NousResearch#8) |
| W3-2 write_episode end-to-end | 2 episodes + 1 fast-tracked fact, Kimi correctly inferred valid_to=2026-05-11 |
| W3-3 promotion + apply | 4 fixture episodes → Kimi diff (2 promote, 1 noise) → applied, semantic_facts 25→27→25 cleanup |
| W3-3 Discord post | digest rendered correctly to #memory-review (channel 1483958144596967464) |
| W3-4 reject + apply | sentinel written → archived as .rejected.json, semantic_facts unchanged |

Tests

522/522 green across the W1-W3 surface in container:

docker exec -w /opt/hermes hermes /opt/hermes/.venv/bin/python3 -m pytest \
  tests/plugins/ tests/scripts/test_import_md.py -q
522 passed, 2 warnings in 8.47s

Including: W1 schema (7), W2-1 read path (10), W2-2 import_md (12), W2-3 prefetch wiring (6), W2-4 memdebug (10), W3-1 extract (22), W3-2 write_episode (11), W3-3 promotion (17), W3-4 memreview (15) = 110 new tests plus existing sibling-plugin coverage that we did not regress.

Spec references

Notes for review

  • Personal fork on chococlaw; not for upstream NousResearch.
  • Voyage remains on the free tier (3 RPM) until a payment method is added; the 200M-token allowance is unchanged.
  • Kimi-K2-Thinking unavailable on synthetic.new at the time of writing — auto-fallback to K2.5 produces acceptable promotion diffs.
  • open_db / init_db gained a keyword-only check_same_thread param (default True; only the provider passes False) — backwards-compatible.

liyoungc and others added 5 commits May 2, 2026 08:07
Introduces a new MemoryProvider plugin implementing Hermes V3 long-term
memory design (two-tier: hot episodes + cold curated semantic_facts,
weekly human-approved promotion).

W1 scope is schema only — no read or write path yet:
  - plugins/memory/sqlite_vec/{__init__,store,plugin.yaml,schema.sql}
  - episodes table (hot raw turn record, channel-scoped idempotent)
  - semantic_facts table (cold curated, with valid_from/valid_to validity
    windows borrowed from the MemPalace temporal-triple pattern)
  - vec_facts vec0 virtual table (512-dim float32) + 3 sync triggers
  - SqliteVecMemoryProvider class registers with MemoryProvider ABC
    but prefetch/sync_turn are no-ops until W2/W3 wire them.

Tests (7/7 passing inside running hermes container):
  - bootstrap creates all expected tables/indexes/triggers
  - bootstrap is idempotent
  - semantic_facts column defaults populate (created_at, valid_from)
  - role CHECK constraint rejects values other than user/assistant
  - triggers keep vec_facts in sync on insert/update/delete
  - vec0 MATCH+k returns nearest neighbour
  - provider lifecycle round-trips

Activates via $HERMES_HOME/config.yaml memory.provider: sqlite_vec
(deferred; W4 cutover only).

Refs liyoungc/hermes-memory#2 (W1-1)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implements the read path for the sqlite_vec memory plugin per
docs/superpowers/specs/2026-05-02-hermes-memory-design.md §4.

embed.py: async voyage_embed() with httpx.AsyncClient, 128-text batching,
3x exponential-backoff retry on 5xx, fail-loud on missing VOYAGE_API_KEY
or 4xx. dim/dtype locked to spec values (512/int8) so config drift
fails fast.

read.py: Fact dataclass + async read_memory() using vec0 prefilter (k=50)
and the SQL CTE rerank with locked weights 0.7*sim + 0.3*recency
(90-day half-life). bump_hits() is fire-and-forget UPDATE that swallows
sqlite errors with a warning. p95 latency logged as JSON line to
~/.hermes/logs/memory.log.
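The batching and retry shape described above can be sketched with the HTTP call injected, so the sketch stays stdlib-only (the real embed.py uses httpx; `embed_all` and `flaky_post` are illustrative names, not the actual API):

```python
import time

BATCH = 128  # Voyage batch size per the commit message

def embed_all(texts, post, retries=3, base_delay=0.0):
    """Batch texts 128 at a time; retry each batch with exponential
    backoff on a (simulated) 5xx, failing loud after the last attempt."""
    out = []
    for i in range(0, len(texts), BATCH):
        batch = texts[i:i + BATCH]
        for attempt in range(retries):
            try:
                out.extend(post(batch))
                break
            except RuntimeError:                  # stand-in for a 5xx
                if attempt == retries - 1:
                    raise                         # fail loud
                time.sleep(base_delay * 2 ** attempt)
    return out

calls = {"n": 0}
def flaky_post(batch):
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("503")                 # first batch fails once
    return [[0.0] * 512 for _ in batch]           # fake 512-dim vectors

vecs = embed_all([f"t{i}" for i in range(130)], flaky_post)
```

130 texts yield two batches (128 + 2); the first batch retries once, so the fake endpoint sees three calls in total.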

W1 schema fix: vec_facts changed from FLOAT[512] to int8[512] to match
spec §1.4 (Voyage 3.5-lite, 512-dim, int8). vec0 int8 columns require
the vec_int8() SQL wrapper on INSERT, and reject UPDATE entirely even
with the wrapper, so sf_after_update_embedding now does DELETE+INSERT.

Tests: 10 new cases (mock httpx for voyage success/batching/5xx-retry/
4xx/missing-key/empty-input; read_memory orders by score and filters
expired; bump_hits increments and swallows errors; format_facts shape).
17/17 green.

Refs liyoungc/hermes-memory#4
scripts/import_md.py seeds semantic_facts from ~/.hermes/memories/MEMORY.md
per spec §6.1. Each "Topic: content §"-delimited entry maps to one
semantic_fact row with entity prefix "禮揚." plus a slug of the topic,
importance=2, valid_from=2026-05-10, valid_to=NULL. Hierarchical topics
like "Tools & Access > ProtonMail" become entity
"禮揚.tools_access.protonmail" so prefix queries still work.
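A minimal sketch of the slug rule as described (hierarchy separators become dots, ASCII is lowercased with punctuation runs collapsed to `_`, CJK passes through because Python's `\w` is Unicode-aware); the actual import_md.py implementation may differ in detail:

```python
import re

def slugify(topic: str) -> str:
    """Illustrative topic-to-entity slug per the description above."""
    parts = [p.strip() for p in topic.split(">")]
    slugs = [re.sub(r"\W+", "_", p.lower()).strip("_") for p in parts]
    return "禮揚." + ".".join(slugs)
```

With this rule, "Tools & Access > ProtonMail" maps to "禮揚.tools_access.protonmail" and a pure-CJK topic like "生日" survives unchanged as "禮揚.生日".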

Embeds in batches of 128 via Voyage 3.5-lite. Idempotent: pre-INSERT
(entity, fact) lookup skips duplicates so re-runs are safe. Wraps the
batch-insert in BEGIN/COMMIT and rolls back on embed failure so partial
imports never land. Supports --dry-run for preview and --commit for the
real write.

W1 schema fix bundled: vec0 column now declares distance_metric=cosine.
Without this, the default L2 distance on int8 vectors produces sim
values in the hundreds, breaking the 0.7*sim + 0.3*recency rerank
formula entirely. Verified end-to-end on chococlaw:

  Q: "我太太生日" ("my wife's birthday") -> top hit "**生日**: 3/19"  sim=0.604  OK
  Q: "AI as digital twin"              -> top hit "Think of AI as a digital twin"  sim=0.607  OK

Tests: 12 new cases for import_md (slugify simple/hierarchy/CJK/empty;
parse colon-missing/no-trailing-§; dry-run no-write; commit populates
vec_facts via trigger; idempotent re-run; partial update embeds only
new; rollback on embed failure leaves DB unchanged). 29/29 green
including W1 + W2-1.

Live import: 25 entries, 1 Voyage batch, all visible in semantic_facts
and vec_facts on chococlaw:/opt/data/memories/memory.db.

Refs liyoungc/hermes-memory#5
SqliteVecMemoryProvider.prefetch() now embeds the user message via
Voyage 3.5-lite, runs read_memory() (vec0 prefilter k=50, SQL CTE
rerank with cosine sim + 90-day half-life), and returns a markdown
block:

    ## Recent relevant memories
    - [entity.slug] fact text (importance: N, age: D days)
    ...

Activation is via config.yaml (memory.provider: sqlite_vec) — no env
var gate. Per spec §4 the persona files (SOUL.md, USER.md,
life-dimensions.md) stay in flat-file injection above this block; the
gateway's existing prompt assembler handles ordering.

Hits accounting (spec §4): retrieved fact IDs are stashed per
session_id. sync_turn() runs bump_hits() on the cached IDs *after* the
reply is delivered, so the UPDATE never sits on the user-facing
latency path. Errors are swallowed.

Async-in-sync bridge: the ABC's prefetch/sync_turn are sync, but the
gateway already owns the asyncio loop, so asyncio.run inline raises.
Solution is a worker thread with its own event loop and a 5s timeout
kill-switch. To make sqlite3 cross-thread access legal, the connection
opens with check_same_thread=False and self._lock serializes both
read_memory and bump_hits. open_db()/init_db() now take a keyword-only
check_same_thread param (default True; provider passes False).
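The worker-thread bridge described above, reduced to a minimal sketch (`run_coro_in_worker` is a hypothetical name; the provider's real plumbing likely differs in detail):

```python
import asyncio
import threading

def run_coro_in_worker(coro, timeout=5.0):
    """Run a coroutine from sync code even when the calling process
    already owns a running event loop: spawn a worker thread with its
    own loop and join with a timeout kill-switch."""
    result, error = {}, {}

    def worker():
        try:
            result["value"] = asyncio.run(coro)   # fresh loop in this thread
        except Exception as e:                    # sketch: surface any failure
            error["value"] = e

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    if t.is_alive():
        raise TimeoutError("memory prefetch exceeded budget")
    if "value" in error:
        raise error["value"]
    return result["value"]

async def fake_prefetch():
    await asyncio.sleep(0.01)
    return "## Recent relevant memories"

block = run_coro_in_worker(fake_prefetch())
```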

format_facts_for_prompt() gained a with_meta=True flag that appends
"(importance: N, age: D days)" per fact, used by prefetch. /memdebug
will keep the compact (with_meta=False) form.

Tests: 6 new cases (markdown header, empty/trivial query no-op,
voyage error swallow, sync_turn bumps then clears cache, worker
timeout, with_meta format). 35/35 green including W1, W2-1, W2-2.

Live activation verified on chococlaw:

  config.yaml memory.provider: '' -> sqlite_vec
  docker compose restart gateway
  Memory provider 'sqlite_vec' registered (0 tools)
  sqlite_vec memory ready at /opt/data/memories/memory.db

End-to-end via MemoryManager.prefetch_all() against the real DB:
"我太太生日" ("my wife's birthday") returns the full 8-fact markdown block with top-1 = "**生日**: 3/19".

Refs liyoungc/hermes-memory#6
plugins/memdebug/ is a standalone plugin that registers the /memdebug
slash command via the hermes-agent ctx.register_command() surface.
Memory plugins live in plugins/memory/ and load through the exclusive
loader, which doesn't pass through the slash-command registry — keeping
/memdebug separate is the cleanest split.

Behaviour (spec §7.2):

  /memdebug                  -> short usage help
  /memdebug <query>          -> top-8 from semantic_facts with
                                score + sim + age + importance breakdown
  /memdebug rawsearch <query> -> substring scan of episodes (forensics)

Each invocation logs to ~/.hermes/logs/memory.log as a JSON line so the
F2 monitoring path (% top-1 hits judged useful) can aggregate weekly.

Reaction logging deferred: the issue acceptance criterion calls for
👍/👎 reaction prompts on the embed message, but Discord-native rich
embeds + reaction collectors require gateway-side plumbing
(gateway/platforms/discord.py) that the spec §8 marks as iterate-after-W2
work. v1 emits a textual "React 👍/👎 to flag this retrieval." cue and
relies on manual user reactions for now.

LOG_PATH bug fix bundled: both this plugin and plugins/memory/sqlite_vec/
were resolving the log path via Path.home(), which inside the hermes
container resolves to /home/hermes — not the /opt/data mount. Switched
to hermes_constants.get_hermes_home() so logs land in the mounted
~/.hermes/logs/memory.log on the host. Confirmed live:

  $ tail -2 ~/.hermes/logs/memory.log
  {"ts": "2026-05-02T13:06:17", "q": "今晚晚餐", "k": 8, "n": 8, "sql_ms": 2.81}
  {"ts": "2026-05-02T13:06:17", "cmd": "memdebug", "q": "今晚晚餐", "n": 8, "ids": [...]}

Also fixed a Python default-arg gotcha: _open_memory_db(path=DEFAULT_DB)
bound DEFAULT_DB at def-time so monkeypatching the module global didn't
take effect. Switched to lazy lookup (path = path or DEFAULT_DB).
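The def-time binding gotcha is easy to reproduce in isolation; the fixed form resolves the module global at call time:

```python
DEFAULT_DB = "/opt/data/memories/memory.db"

# Gotcha: the default is captured when the function is DEFINED,
# so monkeypatching the module global later has no effect.
def open_bound(path=DEFAULT_DB):
    return path

# Fix: lazy lookup resolves the global at CALL time.
def open_lazy(path=None):
    return path or DEFAULT_DB

saved = DEFAULT_DB
DEFAULT_DB = "/tmp/test.db"      # simulate a test monkeypatch
bound, lazy = open_bound(), open_lazy()
DEFAULT_DB = saved
```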

Tests: 10 new for memdebug (truncate, help/empty/rawsearch-no-arg,
semantic with score breakdown, db-missing friendly message,
rawsearch finds substring, rawsearch empty, sync entry-point dispatch,
register() wires the right name + handler shape). 45/45 green
including W1, W2-1, W2-2, W2-3.

Live verification on chococlaw:

  /memdebug              -> help text
  /memdebug 我太太生日   -> top-1 = "**生日**: 3/19" (sim=0.604)
  /memdebug rawsearch 致妤 -> "Episodes are written by W3" (placeholder)

Refs liyoungc/hermes-memory#7
liyoungc changed the title from "feat(memory): bootstrap sqlite_vec plugin schema (W1)" to "feat(memory): Hermes V3 long-term memory — W1 + W2 (sqlite_vec + import + read + memdebug)" on May 2, 2026
liyoungc added 6 commits May 2, 2026 13:22
plugins/memory/sqlite_vec/extract.py implements the per-turn
extraction stage of the write path.

EXTRACT_PROMPT is a verbatim copy of spec §5.2 (HARD RULES 1-4 +
JSON shape contract); paraphrasing here would compromise the F2
monitoring contract that downstream weekly review depends on.

PHI_BLACKLIST_CHANNELS = {"cmio", "cbme", "medicine"} short-circuits
to [] before any network call so hospital data never round-trips
through synthetic.new.

kimi_extract(user, assistant, channel, ts) calls Kimi K2.5 via
synthetic.new's OpenAI-compatible endpoint with temperature=0.1,
response_format=json_object, max_tokens=1024. Token usage is logged
to ~/.hermes/logs/memory.log so weekly review can spot a runaway
extract budget.

JSON parser is intentionally tolerant: in live testing Kimi K2.5
returned three different shapes for the same prompt at temperature=0.1:
  1. bare list           [{...}]
  2. wrapped object      {"analysis": "...", "extracted_memories": [...]}
  3. flat single fact    {"type":"episodic","text":"...","entity":...}
_parse_json_list() handles all three, falls back to the first
list-valued field, and detects single-fact dicts by canonical key
presence.
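A hedged sketch of that three-shape tolerance (the canonical-key set and the exact fallback order here are assumptions, not the real extract.py contract):

```python
import json

CANONICAL_KEYS = {"type", "text"}   # assumed single-fact marker keys

def parse_json_list(raw: str) -> list:
    """Accept a bare list, an object wrapping a list under any key,
    or a flat single-fact dict, per the observed Kimi output shapes."""
    data = json.loads(raw)
    if isinstance(data, list):
        return data
    if isinstance(data, dict):
        if CANONICAL_KEYS <= data.keys():        # flat single fact
            return [data]
        for value in data.values():              # first list-valued field
            if isinstance(value, list):
                return value
    return []

shapes = [
    '[{"type": "semantic", "text": "a"}]',
    '{"analysis": "...", "extracted_memories": [{"type": "semantic", "text": "b"}]}',
    '{"type": "episodic", "text": "c", "entity": null}',
]
parsed = [parse_json_list(s) for s in shapes]
```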

Credential resolution: SYNTHETIC_API_KEY env var first (test override),
then auth.json's credential_pool["custom:synthetic"] (canonical key on
chococlaw). Older / alternate layouts (credential_pools, top-level)
also accepted for resilience.

Coercion drops malformed rows (bad type / blank text / unparseable
importance), clamps importance to 1-5, and validates entity / valid_to_hint
types. Only well-formed facts reach the caller.

Tests: 22 cases (prompt verbatim assertions, PHI blacklist (3),
parser shapes (5), coercion (3), short-circuits (2), mocked
synthetic.new full flow (5), error paths (2), auth.json round-trip).
213/213 green across all memory + scripts tests.

Live smoke test on chococlaw against real synthetic.new + Kimi K2.5:

  pleasantry  ("好的", "okay")          -> 0 facts ✓
  long-lived  ("追 sleep RCT", i.e.
               following the sleep RCT) -> 1 fact (semantic, 禮揚.研究興趣) ✓
  phi-channel ("cmio")                  -> 0 facts (short-circuit) ✓
  short-lived ("致妤 7:30")              -> 0 facts ⚠ (Kimi judges "about 致妤,
                                             not about 禮揚")

The short-lived miss is a spec-level prompt issue, not an extract.py
bug — the prompt says "memories about 禮揚" and Kimi reads that
strictly. Spec §4.1's B1 acceptance example expects this turn to
extract; matching B1 will require a spec edit (e.g. clarifying
"about 禮揚 includes 禮揚's life context"). W3-3 weekly_promotion
runs a separate thinking-mode Kimi pass over a week of episodes,
which is the spec's intended catch for hot-path misses.

Refs liyoungc/hermes-memory#8
plugins/memory/sqlite_vec/write.py implements the per-turn write-back
half of the memory system per spec §5.1.

Hot-path flow:

  1. PHI gate — channel in PHI_BLACKLIST_CHANNELS short-circuits
     extract (raw episode rows still land; the LLM never sees PHI).
  2. kimi_extract returns ExtractedFact list (or [] on failure;
     non-fatal — raw turn is still recorded so weekly_promotion can
     re-extract later).
  3. voyage_embed batches the user msg, reply, and every fact text
     in one Voyage call. Empty strings are filtered out so we don't
     waste a Voyage slot.
  4. INSERT 2 rows into episodes (user, assistant) inside a single
     BEGIN/COMMIT, with ON CONFLICT(channel, external_id) DO NOTHING
     for idempotent Discord redelivery / cron-retry / restart-replay.
  5. Per-fact partition into fast-track vs stash:
       * valid_to_hint parses to <= today + 30 days  -> INSERT
         into semantic_facts directly (the trigger mirrors into
         vec_facts so the next turn's prefetch can retrieve it).
       * everything else -> JSON-stash in episodes.metadata.stashed_facts
         for W3-3 weekly_promotion.
  6. Any exception -> rollback + append the turn (raw text, ts,
     channel, msg_id, error) to ~/.hermes/logs/memory_write_failures.jsonl.
     The reply was already sent; we never propagate the error.

Threshold rationale (spec §5.3): raised from the original 7d to 30d so
short-lived facts ("下週會去日本玩五天", "going to Japan for five days
next week") don't sit in metadata for a week before the next Sunday
review fires.

Provider wiring (plugins/memory/sqlite_vec/__init__.py):

  sync_turn() now schedules two worker-thread coroutines after the
  reply lands: bump_hits (5s budget) and write_episode (30s budget).
  The thread reuses self._lock so cross-thread sqlite3 access remains
  serialized. msg_id is synthesized by hashing
  (session_id, user, assistant, ts-to-the-minute) so Discord
  redeliveries within the same minute collapse via ON CONFLICT.
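The synthesized idempotency key might look like this (field order and separator are illustrative, not the exact implementation):

```python
import hashlib

def synth_msg_id(session_id, user, assistant, ts_iso):
    """Hash the turn with the timestamp truncated to the minute so
    Discord redeliveries within the same minute produce the same id
    and collapse via ON CONFLICT(channel, external_id)."""
    minute = ts_iso[:16]                  # "YYYY-MM-DDTHH:MM"
    payload = "\x1f".join((session_id, user, assistant, minute))
    return hashlib.sha256(payload.encode()).hexdigest()

a = synth_msg_id("chan1", "hi", "hello", "2026-05-02T13:06:17")
b = synth_msg_id("chan1", "hi", "hello", "2026-05-02T13:06:59")  # same minute
c = synth_msg_id("chan1", "hi", "hello", "2026-05-02T13:07:01")  # next minute
```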

No env-var gate (matches W2-3): activation is the same
config.yaml memory.provider: sqlite_vec. Rolling back the write path
specifically would require code change (or temporarily clearing the
provider config), but the hot-path failure mode is a JSONL log entry,
not a stalled reply, so the rollback risk is low.

Tests: 11 new (parse_valid_to_hint edge cases, fast-track threshold
edge / interior / over / null, two episode rows per turn, PHI skips
extract but records, idempotent dup msg_id, short-lived fast-tracks +
mirrors to vec_facts, long-lived stashes in metadata, mixed
partition, embed failure -> JSONL + rollback, extract failure still
records raw, empty turn no embed call). 205/205 green across all
memory + memdebug + import tests.

Live verification on chococlaw:

  Turn A: "今晚致妤大概 7:30 才到家" ("致妤 won't be home until ~7:30
          tonight") / "了解" ("got it")
    -> 2 episodes, 0 facts (Kimi judged "about 致妤 not 禮揚",
       same prompt-wording observation logged in W3-1)

  Turn B: "我下週會去日本玩五天" ("I'll be in Japan for five days next
          week") / "酷..." ("cool...")
    -> 2 episodes, 1 fact fast-tracked:
       (.家庭, "family") "下週會去日本玩五天" valid_from=2026-05-02 valid_to=2026-05-11
    -> vec_facts auto-mirrored via trigger (semantic_facts 25 -> 26).
    -> Kimi correctly inferred valid_to from "下週" (next week) + "五天" (five days).

Cleanup: smoke test data deleted from production DB before commit.

Refs liyoungc/hermes-memory#9
Implements the cold-path of the memory system per spec §5.3 + §5.4.

Two scripts (entry points in ~/.hermes/scripts/):

  scripts/weekly_promotion.py - cron Sun 03:00 UTC+8 (cron expr "0 19 * * 6"
    in UTC). Reads last 7 days of pending episodes, runs one Kimi call to
    produce a promotion diff, persists the diff to
    ~/.hermes/memories/pending_diffs/wk-YYYY-MM-DD.json, renders the digest
    markdown per spec §5.4, posts it to #memory-review via raw Discord HTTP.
    Does NOT stamp episodes.promoted_at.

  scripts/weekly_apply.py - cron Mon 03:00 UTC+8 ("0 19 * * 0" UTC).
    Purges pending_diffs/*.json older than 14 days at start. Loads the
    latest pending diff. If a <digest_id>.rejected sentinel file exists
    (written by /memreview reject in W3-4), archives the diff as rejected
    and exits. Otherwise applies promote / dedup / expire atomically and
    stamps episodes.promoted_at on the candidate rows.

Both scripts emit a final stdout line {"wakeAgent": false} so the cron
framework's wake gate skips the agent run — delivery is handled inside
the script via the Discord HTTP POST helper, no LLM round-trip needed
for the cron job itself.

Core logic lives in plugins/memory/sqlite_vec/promotion.py:
  - PROMOTION_PROMPT designed to mirror EXTRACT_PROMPT style: same
    HARD RULES (PHI blacklist, pleasantry filter, synthetic handling,
    err-on-side-of-not-promoting), four explicit actions
    (PROMOTE / DEDUP_HIT / EXPIRE / DROP_AS_NOISE), and a verbatim
    output schema.
  - Per-candidate vec_search prefilter k=20 keeps the prompt small
    (only nearest-neighbor existing facts, not the whole active set,
    so prompt stays bounded as semantic_facts grows past 500 rows).
  - WeekDigest dataclass round-trips JSON, render_digest_markdown
    matches spec §5.4 layout (Promote / Dedup / Expire / Noise sections,
    emoji icons, character-truncated chunks for Discord 2000-char limit).
  - discord_post chunks long messages on newline boundaries before 1990
    chars to stay under Discord's per-message ceiling.
  - memory_review_channel_id resolves the live channel from
    ~/.hermes/channel_directory.json (which stores platforms.discord
    as a list of {id, name, guild, type} dicts on chococlaw).
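The newline-boundary chunking described for discord_post can be sketched as below (names are illustrative; the real helper may differ):

```python
LIMIT = 1990  # stays under Discord's 2000-char per-message ceiling

def chunk_message(text, limit=LIMIT):
    """Accumulate lines until adding the next one would exceed the
    limit, then start a new chunk, so splits only land on newlines."""
    chunks, current = [], ""
    for line in text.split("\n"):
        candidate = line if not current else current + "\n" + line
        if len(candidate) > limit and current:
            chunks.append(current)
            current = line
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

digest = "\n".join(f"- fact {i}: " + "x" * 80 for i in range(50))
parts = chunk_message(digest)
```

Splitting only at newlines means rejoining the chunks with "\n" reconstructs the digest exactly, which keeps the markdown sections intact across messages.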

Critical refactor: _apply_diff_atomic embeds promote-fact texts BEFORE
opening the BEGIN/COMMIT, then writes blobs into the transaction.
Holding the writer lock open across a Voyage HTTP round-trip would
block hot-path write_episode for the duration of the call (300ms+).
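The embed-before-BEGIN ordering, reduced to a sketch (table name and the injected embed signature are hypothetical):

```python
import sqlite3
import time

def apply_diff_atomic(db, facts, embed):
    """Run the slow embedding call BEFORE opening the transaction so
    the sqlite writer lock is never held across network I/O."""
    blobs = embed([f["text"] for f in facts])   # slow HTTP happens here
    with db:                                    # BEGIN ... COMMIT
        for fact, blob in zip(facts, blobs):
            db.execute(
                "INSERT INTO facts (text, embedding) VALUES (?, ?)",
                (fact["text"], blob),
            )

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE facts (text TEXT, embedding BLOB)")

def fake_embed(texts):
    time.sleep(0.01)                 # stand-in for the Voyage round-trip
    return [bytes(512) for _ in texts]

apply_diff_atomic(db, [{"text": "a"}, {"text": "b"}], fake_embed)
n = db.execute("SELECT COUNT(*) FROM facts").fetchone()[0]
```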

Live verification on chococlaw:

  Inserted 4 fixture episodes -> weekly_promotion -> Kimi call:
    Kimi-K2-Thinking 404'd on synthetic.new; auto-fallback to K2.5.
    Returned: 2 promote, 0 dedup, 0 expire, 1 drop_as_noise.
  weekly_apply applied diff: promoted=2 stamped=4
    semantic_facts: 25 -> 27 (then back to 25 after smoke cleanup)

  Discord post test to #memory-review (channel 1483958144596967464):
    posted=True, format renders correctly with all four sections.

Cron entries added to ~/.hermes/cron/jobs.json:
  Hermes Weekly Memory Promotion - 0 19 * * 6 (Sun 03:00 UTC+8)
  Hermes Weekly Memory Apply     - 0 19 * * 0 (Mon 03:00 UTC+8)
Both enabled, deliver=discord, script-driven (wake-gate=false).

Tests: 17 new for promotion (prompt placeholders, hard-rule presence,
candidate / neighbor formatting, digest_id format, WeekDigest round-trip,
markdown renders all 4 sections, empty-section collapse, no-candidates
short-circuit, dry-run no-write, real-run persists diff, no-pending-diff
exit, rejection sentinel archives without applying, promote inserts +
mirrors to vec_facts + stamps episodes, dedup bumps hits, expire sets
valid_to, purge_old_pending). 222/222 green across all memory + memdebug
+ import + scripts tests.

Operational notes:
- Kimi-K2-Thinking unavailable on synthetic.new (404) - we auto-fallback
  to Kimi-K2.5 with temp=0.2. Quality looks acceptable; revisit if
  promotion misses obvious dedup opportunities.
- The hot-path write_episode keeps stashing long-lived facts into
  episodes.metadata.stashed_facts, so the first real Sunday firing on
  a chocoprod week will draw from real data.

Refs liyoungc/hermes-memory#10
The hermes scheduler hard-binds ~/.hermes/scripts/ as the only exec path
for cron jobs, so the runtime copies must live there per-host. Keeping
the canonical sources in the repo means PR review can see them and a
fresh chococlaw rebuild is a 2-line cp + jobs.json patch.

Refs liyoungc/hermes-memory#10
plugins/memreview/ is a standalone slash-command plugin registering
two commands per spec §7.1:

  /memreview reject <digest_id>  - writes
    ~/.hermes/memories/pending_diffs/<digest_id>.rejected
    Monday's weekly_apply reads this sentinel and archives the diff
    without applying any of its promote / dedup / expire actions;
    candidate episodes stay unstamped for next Sunday's window.

  /memreview pending             - lists all pending digest_ids,
                                    flagging any that already carry a
                                    rejection sentinel.

  /mem off                       - global kill switch. Writes
                                    HERMES_HOME/MEM_OFF. Both
                                    SqliteVecMemoryProvider.sync_turn
                                    (hot path) and weekly_promotion
                                    (cold path) check for this file at
                                    the top of each call and short-
                                    circuit. Read path is unaffected.

  /mem on                        - removes the sentinel.

  /mem status                    - human-readable state of the kill
                                    switch + pending diff list.

Why slash commands rather than Discord reactions: spec §7.1 explicitly
chose slash because reactions don't reliably trigger webhook events
across all bot adapters — a silent kill-switch failure is worse than
no switch.

Sentinel file design rationale: file-system state (rather than in-memory
process flags) survives container restart, cross-thread visibility
without locks, and gives the user a manual recovery path
(touch / rm the file directly).
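The sentinel pattern, reduced to a sketch (the paths here are stand-ins; the real code resolves HERMES_HOME via hermes_constants):

```python
import tempfile
from pathlib import Path

HERMES_HOME = Path(tempfile.mkdtemp())   # stand-in for ~/.hermes
SENTINEL = HERMES_HOME / "MEM_OFF"

def mem_off_active() -> bool:
    """File-system kill switch: state survives container restarts and
    is visible across threads without locks."""
    return SENTINEL.exists()

def mem_off():                           # /mem off
    SENTINEL.touch()

def mem_on():                            # /mem on
    SENTINEL.unlink(missing_ok=True)     # idempotent removal

before = mem_off_active()
mem_off()
during = mem_off_active()
mem_on()
mem_on()                                 # second call is a no-op
after = mem_off_active()
```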

Wired into the write paths:
  - plugins/memory/sqlite_vec/__init__.py: sync_turn now checks
    _mem_off_active() before scheduling the write_episode worker.
    bump_hits still fires (it's read-side accounting).
  - plugins/memory/sqlite_vec/promotion.py: weekly_promotion checks
    mem_off_active() at the top of the function and returns a
    "skipped: /mem off active" summary without reading episodes,
    calling Kimi, or persisting any diff.

Both call sites import lazily from plugins.memreview so the memory
plugin still loads cleanly even if memreview is uninstalled.

Tests: 15 new (help text, pending list with/without rejected flag,
reject invalid/unknown/valid digest_id, /mem off+on creates/deletes
sentinel, /mem on idempotent, /mem status with and without pending,
register() wires both commands, end-to-end reject -> apply archives
without applying, /mem off short-circuits weekly_promotion before
Kimi is called). 522/522 green across all plugin tests.

Live verification on chococlaw:

  1. wrote fake pending diff wk-2026-05-02.json (with a "should NEVER land"
     promote entry).
  2. /memreview pending — listed it.
  3. /memreview reject wk-2026-05-02 — sentinel created, confirmation reply.
  4. weekly_apply — archived as wk-2026-05-02.rejected.json, sentinel
     auto-cleaned. semantic_facts unchanged (25 -> 25). The promote was
     correctly discarded.
  5. /mem off / status / on cycle — sentinel toggled at /opt/data/MEM_OFF.

Refs liyoungc/hermes-memory#11
Idempotent bash script that performs the W4 cutover steps when run
with --commit. Default invocation is dry-run.

Steps:
  1. Pre-flight (verify memory.db exists, recent episodes accumulated)
  2. Archive ~/.hermes/memories/MEMORY.md → MEMORY.md.archive-YYYY-MM-DD
     (chmod 444 for read-only)
  3. Confirm config.yaml memory.provider == sqlite_vec
  4. Disable legacy memory crons (Dimensions Memory Consolidation,
     Forgetting Curve) by flipping enabled=false in jobs.json
  5. Smoke test the new provider end-to-end
  6. Restart gateway

Spec target date 2026-05-24, after observing one successful weekly
review cycle. Caller is the user; script is non-destructive in dry-run
mode and refuses to overwrite existing archives so re-running mid-fail
is safe.

Rollback procedure documented in hermes-memory/docs/runbooks/memory-rollback.md §3.

Refs liyoungc/hermes-memory#12
@liyoungc liyoungc merged commit 8591ee2 into main May 2, 2026
3 of 7 checks passed
@liyoungc liyoungc deleted the feat/memory-sqlite-vec-w1 branch May 2, 2026 14:12

liyoungc (Owner, Author) commented May 2, 2026

Note (2026-05-02): The implementation introduced by this PR has been extracted to a dedicated plugin repo — see liyoungc/hermes-memory-plugin. The 25 files added here have been removed from this fork in #4. The git history of how we got here is preserved (this PR + its merge commit), but the current state of main no longer contains them. To install on a fresh hermes-agent checkout, run hermes-memory-plugin/install.sh.

liyoungc added a commit that referenced this pull request May 2, 2026
Removes the 25 implementation files merged in #2 (W1-W4-2). The code
now lives in a dedicated repo (liyoungc/hermes-memory-plugin) installed
via:

    git clone git@github.com:liyoungc/hermes-memory-plugin.git ~/Projects/hermes-memory-plugin
    ~/Projects/hermes-memory-plugin/install.sh ~/Projects/hermes-agent

The install symlinks plugins/memory/sqlite_vec, plugins/memdebug, and
plugins/memreview from the plugin repo into hermes-agent/plugins/. For
docker-based deploys, install.sh additionally writes a
docker-compose.override.yml with bind mounts so the running container
picks up live edits without an image rebuild.

Why extract:
  - git pull upstream/main on this fork is now trivial again (no merge conflicts)
  - Plugin code can be installed on a vanilla NousResearch fork
  - Spec edits and prompt iterations land in one place

DB at ~/.hermes/memories/memory.db is untouched. Cron jobs in
~/.hermes/cron/jobs.json migrate via install.sh.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
